AEMS: An Anytime Online Search Algorithm for Approximate Policy Refinement in Large POMDPs
Abstract
Solving large Partially Observable Markov Decision Processes (POMDPs) is a complex task which is often intractable. Considerable effort has been made to develop approximate offline algorithms that solve ever larger POMDPs. However, even state-of-the-art approaches fail to solve large POMDPs in reasonable time. Recent developments in online POMDP search suggest that combining offline computations with online computations is often more efficient and can also considerably reduce the error made by approximate policies computed offline. In the same vein, we propose a new anytime online search algorithm which seeks to minimize, as efficiently as possible, the error made by an approximate value function computed offline. In addition, we show how previous online computations can be reused in subsequent time steps to prevent redundant computations. Our preliminary results indicate that our approach is able to tackle large state and observation spaces efficiently and under real-time constraints.
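The abstract's core idea, expanding an online search tree under a time budget to shrink the error bounds of an offline value function, can be illustrated with a minimal, self-contained sketch. The `Node` class, the bound-halving `expand` step, and the discount factor below are toy stand-ins chosen for illustration, not the paper's actual AEMS heuristic:

```python
import time

class Node:
    """Search-tree node carrying lower/upper bounds on the optimal value."""
    def __init__(self, lower, upper, depth=0, parent=None):
        self.lower, self.upper = lower, upper
        self.depth = depth
        self.parent = parent
        self.children = []

def leaves(node):
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)

def expand(node):
    # Toy expansion: each child tightens the interval toward its midpoint,
    # standing in for real belief updates and offline bound evaluations.
    mid = 0.5 * (node.lower + node.upper)
    node.children = [
        Node(0.5 * (node.lower + mid), 0.5 * (node.upper + mid),
             depth=node.depth + 1, parent=node)
        for _ in range(2)
    ]

def backup(node):
    # Propagate tightened bounds up to the root (value = max over children).
    while node is not None:
        if node.children:
            node.lower = max(node.lower, max(c.lower for c in node.children))
            node.upper = min(node.upper, max(c.upper for c in node.children))
        node = node.parent

def anytime_search(root, budget_s=0.01, gamma=0.95):
    """Expand the fringe node with the largest discounted bound gap until
    the time budget runs out; return the remaining error bound at the root."""
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        best = max(leaves(root),
                   key=lambda n: (gamma ** n.depth) * (n.upper - n.lower))
        if best.upper - best.lower < 1e-6:
            break
        expand(best)
        backup(best)
    return root.upper - root.lower
```

The anytime property comes from the loop structure: the search can be interrupted at any deadline and still returns valid, monotonically tightening bounds on the root value.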
Similar papers
Online Policy Improvement in Large POMDPs via an Error Minimization Search
Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical framework for planning under uncertainty. However, most real-world systems are modelled by huge POMDPs that cannot be solved due to their high complexity. To mitigate this difficulty, we propose combining existing offline approaches with an online search process, called AEMS, that can locally improve an appro...
Theoretical Analysis of Heuristic Search Methods for Online POMDPs
Planning in partially observable environments remains a challenging problem, despite significant recent advances in offline approximation techniques. A few online methods have also been proposed recently, and proven to be remarkably scalable, but without the theoretical guarantees of their offline counterparts. Thus it seems natural to try to unify offline and online techniques, preserving the ...
DESPOT: Online POMDP Planning with Regularization
POMDPs provide a principled framework for planning under uncertainty, but are computationally intractable, due to the “curse of dimensionality” and the “curse of history”. This paper presents an online search algorithm that alleviates these difficulties by focusing on a set of sampled scenarios. The execution of all policies on the sampled scenarios is captured in a Determinized Sparse Partiall...
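The "sampled scenarios" idea above can be shown in miniature: fix a set of random seeds up front, so that every candidate policy is evaluated on exactly the same determinized outcomes and comparisons between policies are reproducible. The toy dynamics, reward, and policies below are assumptions for illustration, not DESPOT itself:

```python
import random

def rollout(policy, seed, horizon=20):
    # One determinized scenario: the seed fixes all stochastic outcomes,
    # so re-running the same policy on the same seed gives the same return.
    rng = random.Random(seed)
    state, total, discount = 0.0, 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        noise = rng.gauss(0.0, 1.0)            # determinized noise draw
        state = 0.9 * state + action + 0.1 * noise
        total += discount * -abs(state)        # toy reward: stay near the origin
        discount *= 0.95
    return total

def scenario_value(policy, seeds):
    # Estimate a policy's value on the fixed scenario set, not fresh randomness.
    return sum(rollout(policy, s) for s in seeds) / len(seeds)

seeds = list(range(50))                        # K sampled scenarios, fixed up front

def passive(state):
    return 0.0

def stabilize(state):
    return -0.5 * state
```

Because the scenario set is fixed, `scenario_value` is a deterministic function of the policy, which is what lets a search over policies compare candidates without fresh sampling noise.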
Heuristic Search Value Iteration for POMDPs
We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy. HSVI gets its power by combining two well-known techniques: attention-focusing search heuristics and piecewise linear convex representations of the value function. HSVI’s soundness an...
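The "piecewise linear convex representation of the value function" mentioned above is concrete enough to sketch: a lower bound on the value function is stored as a set of alpha-vectors, and evaluating a belief takes the maximum dot product over them. The two-state alpha-vectors below are made-up numbers for illustration:

```python
def value_lower_bound(belief, alpha_vectors):
    # Piecewise-linear convex bound: V(b) >= max over alpha of <alpha, b>.
    return max(sum(a * p for a, p in zip(alpha, belief))
               for alpha in alpha_vectors)

# Hypothetical alpha-vectors for a two-state problem, one per candidate policy.
alphas = [[1.0, 0.0],
          [0.0, 1.0]]
```

Adding alpha-vectors can only raise this maximum, which is why such a representation yields a monotonically improving lower bound during search.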
Incremental Sampling based Algorithms for State Estimation
Perception is a crucial aspect of the operation of autonomous vehicles. With a multitude of different sources of sensor data, it becomes important to have algorithms which can process the available information quickly and provide a timely solution. Also, an inherently continuous world is sensed by robot sensors and converted into discrete packets of information. Algorithms that can take advanta...